Results 1 - 2 of 2
1.
Stud Health Technol Inform ; 302: 302-306, 2023 May 18.
Article in English | MEDLINE | ID: covidwho-2327301

ABSTRACT

Contradictions as a data quality indicator are typically understood as impossible combinations of values in interdependent data items. While the handling of a single dependency between two data items is well established, to our knowledge no common notation or structured evaluation method exists yet for more complex interdependencies. Defining such contradictions requires specific biomedical domain knowledge, while informatics domain knowledge is needed to implement them efficiently in assessment tools. We propose a notation of contradiction patterns that reflects the information provided and required by the different domains. We consider three parameters (α, β, θ): the number of interdependent items as α, the number of contradictory dependencies defined by domain experts as β, and the minimal number of Boolean rules required to assess these contradictions as θ. Inspection of the contradiction patterns in existing R packages for data quality assessment shows that all six examined packages implement the (2,1,1) class. We investigate more complex contradiction patterns in the biobank and COVID-19 domains, showing that the minimum number of Boolean rules might be significantly lower than the number of described contradictions. Although domain experts might formulate a different number of contradictions, we are confident that such a notation and a structured analysis of contradiction patterns help to handle the complexity of multidimensional interdependencies within health data sets. A structured classification of contradiction checks will allow scoping of different contradiction patterns across multiple domains and effectively support the implementation of a generalized contradiction assessment framework.
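To make the (α, β, θ) notation concrete, here is a minimal Python sketch. It is not the authors' implementation and does not reflect any of the examined R packages; the class `ContradictionPattern`, the item names (`sex`, `pregnant`, `severity`, `fever`, `cough`) and the rules are hypothetical illustrations. The first pattern is the classic (2,1,1) case. The second shows how two expert-defined contradictions over three items can be detected by a single Boolean rule, giving (3,2,1). Note that `signature` only counts the rules as written; it does not prove that θ is minimal.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# A record maps item names to values; missing items are simply absent.
Record = Dict[str, object]
Rule = Callable[[Record], bool]


@dataclass
class ContradictionPattern:
    """Hypothetical container for one pattern in (alpha, beta, theta) notation."""
    items: List[str]              # interdependent data items -> alpha
    contradictions: List[Record]  # expert-defined impossible combinations -> beta
    rules: List[Rule]             # Boolean rules used to detect them -> theta

    @property
    def signature(self) -> Tuple[int, int, int]:
        # Counts as written; minimality of the rule set is not checked here.
        return (len(self.items), len(self.contradictions), len(self.rules))

    def is_contradictory(self, record: Record) -> bool:
        # A record is flagged if any Boolean rule evaluates to True.
        return any(rule(record) for rule in self.rules)

    def covers_all(self) -> bool:
        # Sanity check: every expert-defined contradiction triggers some rule.
        return all(self.is_contradictory(c) for c in self.contradictions)


# Hypothetical (2,1,1) case: two items, one contradiction, one rule.
sex_pregnancy = ContradictionPattern(
    items=["sex", "pregnant"],
    contradictions=[{"sex": "male", "pregnant": True}],
    rules=[lambda r: r.get("sex") == "male" and r.get("pregnant") is True],
)

# Hypothetical (3,2,1) case: two contradictions collapse into one rule.
asymptomatic_symptoms = ContradictionPattern(
    items=["severity", "fever", "cough"],
    contradictions=[
        {"severity": "asymptomatic", "fever": True},
        {"severity": "asymptomatic", "cough": True},
    ],
    rules=[
        lambda r: r.get("severity") == "asymptomatic"
        and (r.get("fever") is True or r.get("cough") is True)
    ],
)

print(sex_pregnancy.signature)             # (2, 1, 1)
print(asymptomatic_symptoms.signature)     # (3, 2, 1)
print(asymptomatic_symptoms.covers_all())  # True
print(asymptomatic_symptoms.is_contradictory(
    {"severity": "mild", "fever": True, "cough": True}))  # False
```

The `covers_all` check reflects the division of labour described above: domain experts supply the contradictions (β), and the informatics side must show that its smaller rule set (θ) still detects every one of them.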


Subject(s)
COVID-19 , Data Accuracy , Humans
2.
Methods Inf Med ; 62(S 01): e47-e56, 2023 Jun.
Article in English | MEDLINE | ID: covidwho-2237390

ABSTRACT

BACKGROUND: As a national effort to better understand the current pandemic, three cohorts within the German National Pandemic Cohort Network (NAPKON) collect sociodemographic and clinical data from coronavirus disease 2019 (COVID-19) patients in different target populations. Furthermore, the German Corona Consensus Dataset (GECCO) was introduced as a harmonized basic information model for COVID-19 patients in clinical routine. To compare the cohort data with other GECCO-based studies, data items are mapped to GECCO. Because mapping from one information model to another is complex, an additional consistency evaluation of the mapped items is recommended to detect possible mapping issues or source data inconsistencies.
OBJECTIVES: The goal of this work is to ensure high consistency of research data mapped to the GECCO data model. In particular, it aims to identify contradictions within interdependent GECCO data items of the German national COVID-19 cohorts so that possible reasons for the identified contradictions can be investigated. We furthermore aim to enable other researchers to easily perform data quality evaluation on GECCO-based datasets and to adapt it to similar data models.
METHODS: All suitable data items from each of the three NAPKON cohorts are mapped to the GECCO items. A consistency assessment tool (dqGecco) is implemented following the design of an existing quality assessment framework, retaining that framework's consistency taxonomies, including logical and empirical contradictions. Results of the assessment are verified independently on the primary data source.
RESULTS: Our consistency assessment tool helped correct the mapping procedure and revealed remaining contradictory value combinations within COVID-19 symptoms, vital signs, and COVID-19 severity. Consistency rates differ between indicators and cohorts, ranging from 95.84% to 100%.
CONCLUSION: An efficient and portable tool capable of discovering inconsistencies in the COVID-19 domain has been developed and applied to three different cohorts. As the GECCO dataset is employed in different platforms and studies, the tool can be directly applied there or adapted to similar information models.
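The abstract reports consistency rates per indicator but does not define how they are computed. The following Python sketch assumes a rate is the percentage of assessable records that do not trigger a contradiction rule, where a record counts as assessable only if all items involved in the check are present. This is not dqGecco, which is a separate tool whose interface is not shown here. The check names, the 37.5 degrees Celsius threshold, and the toy records are all hypothetical, and the logical/empirical labels only illustrate the taxonomy mentioned above.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, object]
Rule = Callable[[Record], bool]


def consistency_rate(records: List[Record], items: List[str], rule: Rule) -> Optional[float]:
    """Percentage of assessable records that do NOT trigger `rule`.

    Assumption: a record is assessable only if all `items` are present
    (not None). Returns None when no record is assessable.
    """
    assessable = [r for r in records if all(r.get(i) is not None for i in items)]
    if not assessable:
        return None
    flagged = sum(1 for r in assessable if rule(r))
    return 100.0 * (len(assessable) - flagged) / len(assessable)


# Hypothetical checks: (kind, involved items, Boolean rule).
# The 37.5 degrees Celsius threshold is illustrative only.
CHECKS: Dict[str, Tuple[str, List[str], Rule]] = {
    "asymptomatic_with_fever": (
        "logical",
        ["severity", "fever"],
        lambda r: r["severity"] == "asymptomatic" and r["fever"] is True,
    ),
    "fever_below_threshold": (
        "empirical",
        ["fever", "temperature_c"],
        lambda r: r["fever"] is True and r["temperature_c"] < 37.5,
    ),
}

# Toy records, not real cohort data; None marks a missing value.
records: List[Record] = [
    {"severity": "asymptomatic", "fever": True, "temperature_c": 36.5},
    {"severity": "mild", "fever": True, "temperature_c": 38.4},
    {"severity": "asymptomatic", "fever": False, "temperature_c": 36.8},
    {"severity": "severe", "fever": None, "temperature_c": 39.1},
    {"severity": "mild", "fever": True, "temperature_c": 37.0},
]

results = {
    name: consistency_rate(records, items, rule)
    for name, (kind, items, rule) in CHECKS.items()
}
for name, (kind, _, _) in CHECKS.items():
    print(f"{name} ({kind}): {results[name]:.2f}%")
# asymptomatic_with_fever (logical): 75.00%
# fever_below_threshold (empirical): 50.00%
```

Excluding records with missing values from the denominator keeps missingness from being counted as consistency. Per-cohort rates, as reported in the abstract, would additionally require grouping the records by a cohort identifier before calling `consistency_rate`; that step is omitted here.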


Subject(s)
COVID-19 , Data Accuracy , Humans , Consensus , Pandemics , Quality Indicators, Health Care , COVID-19/epidemiology , Data Collection